NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Measuring and Improving the Efficiency of Python Code Generated by LLMs Using CoT Prompting and Fine-Tuning

https://doi.org/10.1109/ACCESS.2025.3585742

Jonnala, Ramya; Yang, Jeong; Lee, Young; Liang, Gongbo; Cao, Zechun (July 2025, IEEE Access)

The burgeoning sophistication of Artificial Intelligence (AI) has catalyzed the rapid proliferation of Large Language Models (LLMs) within software development. These models are increasingly employed to automate the generation of functionally correct code, address complex computational problems, and facilitate the debugging of existing software systems. However, LLM-generated code often faces challenges due to inherent inefficiencies, including redundant logical structures, factually inconsistent content (hallucinations), and programming errors. To address this issue, our research rigorously evaluated the computational efficiency of Python code generated by three prominent LLMs: GPT-4o-Mini, GPT-3.5-Turbo, and GPT-4-Turbo. The evaluation metrics encompass execution time, memory utilization, and peak memory consumption, while maintaining the functional correctness of the generated code. Leveraging the EffiBench benchmark datasets within the Google Vertex AI Workbench environment, across a spectrum of machine configurations, the study implemented a consistent seed parameter to ensure experimental reproducibility. Furthermore, we investigated the impact of two distinct optimization strategies: Chain-of-Thought (CoT) prompting and model fine-tuning. Our findings reveal a significant enhancement in efficiency metrics for GPT-4o-Mini and GPT-3.5-Turbo when employing CoT prompting; however, this trend was not observed for GPT-4-Turbo. Based on its promising performance with CoT prompting, we selected the GPT-4o-Mini model for subsequent fine-tuning, aiming to further enhance both its computational efficiency and accuracy. However, contrary to our expectations, fine-tuning the GPT-4o-Mini model led to a discernible degradation in both its accuracy and computational efficiency. In conclusion, this study provides empirical evidence suggesting that the deployment of high-CPU machine configurations, in synergy with the utilization of the GPT-4o-Mini model and CoT prompting techniques, yields demonstrably more efficient and accurate LLM-generated Python code, particularly within computationally intensive application scenarios.
more » « less
Full Text Available
Machine Learning-Based Vulnerability Detection in Rust Code Using LLVM IR and Transformer Model

https://doi.org/10.3390/make7030079

Lee, Young; Boshra, Syeda Jannatul; Yang, Jeong; Cao, Zechun; Liang, Gongbo (September 2025, Machine Learning and Knowledge Extraction)

Rust’s growing popularity in high-integrity systems requires automated vulnerability detection in order to maintain its strong safety guarantees. Although Rust’s ownership model and compile-time checks prevent many errors, sometimes unexpected bugs may occasionally pass analysis, underlining the necessity for automated safe and unsafe code detection. This paper presents Rust-IR-BERT, a machine learning approach to detect security vulnerabilities in Rust code by analyzing its compiled LLVM intermediate representation (IR) instead of the raw source code. This approach offers novelty by employing LLVM IR’s language-neutral, semantically rich representation of the program, facilitating robust detection by capturing core data and control-flow semantics and reducing language-specific syntactic noise. Our method leverages a graph-based transformer model, GraphCodeBERT, which is a transformer architecture pretrained model to encode structural code semantics via data-flow information, followed by a gradient boosting classifier, CatBoost, that is capable of handling complex feature interactions—to classify code as vulnerable or safe. The model was evaluated using a carefully curated dataset of over 2300 real-world Rust code samples (vulnerable and non-vulnerable Rust code snippets) from RustSec and OSV advisory databases, compiled to LLVM IR and labeled with corresponding Common Vulnerabilities and Exposures (CVEs) identifiers to ensure comprehensive and realistic coverage. Rust-IR-BERT achieved an overall accuracy of 98.11%, with a recall of 99.31% for safe code and 93.67% for vulnerable code. Despite these promising results, this study acknowledges potential limitations such as focusing primarily on known CVEs. Built on a representative dataset spanning over 2300 real-world Rust samples from diverse crates, Rust-IR-BERT delivers consistently strong performance. Looking ahead, practical deployment could take the form of a Cargo plugin or pre-commit hook that automatically generates and scans LLVM IR artifacts during the development cycle, enabling developers to catch vulnerabilities at an early stage in the development cycle.
more » « less
Full Text Available
Analyzing the Usability, Performance, and Cost-Efficiency of Deploying ML Models on BigQuery ML and Vertex AI in Google Cloud

https://doi.org/10.1145/3694860.3694863

Wang, Hongyu; Yang, Jeong; Liang, Gongbo; Lee, Young; Cao, Zechun (August 2024, ACM)

Full Text Available
Board 267: Enhancing Urban Mobility: SmartSAT's Impact on Public Transportation Services and Commuting Experience

https://doi.org/10.18260/1-2--46840

Yang, Jeong; Lee, Young; Abdel-Rahman, Mohammad; Cao, Zechun (June 2024, ASEE Conferences)

Full Text Available
Multi-Scale Self-Supervised Consistency Training for Trustworthy Medical Imaging Classification

https://doi.org/10.1109/EMBC53108.2024.10782322

Han, Bonian; Moran, Cristian; Yang, Jeong; Lee, Young; Cao, Zechun; Liang, Gongbo (July 2024, IEEE)

Full Text Available
Board 209: Bridging Language Barriers in Healthcare Education: An Approach for Intelligent Tutoring Systems with Code-Switching Adaptation

https://doi.org/10.18260/1-2--46776

Cao, Zechun; Zavala_Villafuerte, German; Jalooli, Ali; Balyan, Renu; Moosavi, Sanaz; Iacobelli, Francisco (June 2024, ASEE Conferences)

The recent rapid development in Natural Language Processing (NLP) has greatly en- hanced the effectiveness of Intelligent Tutoring Systems (ITS) as tools for healthcare education. These systems hold the potential to improve health-related quality of life (HRQoL) outcomes, especially for populations with limited English reading and writing skills. However, despite the progress in pre-trained multilingual NLP models, there exists a noticeable research gap when it comes to code-switching within the medical context. Code-switching is a prevalent phenomenon in multilingual communities where individuals seamlessly transition between languages during conversations. This presents a distinctive challenge for healthcare ITS aimed at serving multilin- gual communities, as it demands a thorough understanding of and accurate adaptation to code- switching, which has thus far received limited attention in research. The hypothesis of our work asserts that the development of an ITS for healthcare education, culturally appropriate to the Hispanic population with frequent code-switching practices, is both achievable and pragmatic. Given that text classification is a core problem to many tasks in ITS, like sentiment analysis, topic classification, and smart replies, we target text classification as the application domain to validate our hypothesis. Our model relies on pre-trained word embeddings to offer rich representations for understand- ing code-switching medical contexts. However, training such word embeddings, especially within the medical domain, poses a significant challenge due to limited training corpora. In our approach to address this challenge, we identify distinct English and Spanish embeddings, each trained on medical corpora, and subsequently merge them into a unified vector space via space transforma- tion. In our study, we demonstrate that singular value decomposition (SVD) can be used to learn a linear transformation (a matrix), which aligns monolingual vectors from two languages in a single meta-embedding. As an example, we assessed the similarity between the words “cat” and “gato” both before and after alignment, utilizing the cosine similarity metric. Prior to alignment, these words exhibited a similarity score of 0.52, whereas after alignment, the similarity score increased to 0.64. This example illustrates that aligning the word vectors in a meta-embedding enhances the similarity between these words, which share the same meaning in their respective languages. To assess the quality of the representations in our meta-embedding in the context of code-switching, we employed a neural network to conduct text classification tasks on code-switching datasets. Our results demonstrate that, compared to pre-trained multilingual models, our model can achieve high performance in text classification tasks while utilizing significantly fewer parameters.
more » « less
Full Text Available

Search for: All records